Knee osteoarthritis (OA) is the most common form of osteoarthritis and a leading cause of disability. Cartilage defects, visible on magnetic resonance imaging (MRI), are regarded as primary manifestations of knee OA. Early detection and assessment of knee cartilage defects is therefore important for protecting patients with knee OA. Accordingly, many attempts have been made to assess knee cartilage defects by applying convolutional neural networks (CNNs) to knee MRI. However, the physiological characteristics of cartilage may hinder such efforts: cartilage is a thin, curved layer, which means that only a small fraction of the voxels in a knee MRI can contribute to defect assessment; heterogeneous scanning protocols further challenge the feasibility of CNNs in clinical practice; and CNN-based assessment results lack interpretability. To address these challenges, we model the structure and appearance of knee cartilage from MRI into a graph representation, which is able to handle highly diverse clinical data. Then, guided by this cartilage graph representation, we design a non-Euclidean deep learning network with a self-attention mechanism that extracts cartilage features both locally and globally and derives the final assessment with visualized results. Our comprehensive experiments show that the proposed method yields superior performance in knee cartilage defect assessment, along with convenient, interpretable 3D visualization.
translated by Google Translate
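The graph-based assessment above can be sketched as masked self-attention over a cartilage graph, where each node is a surface patch and attention is restricted to graph neighbours. The function and chain graph below are an illustrative toy, not the paper's actual network:

```python
import numpy as np

def graph_self_attention(x, adj, w_q, w_k, w_v):
    """One masked self-attention step: each node (e.g. a cartilage
    surface patch) attends only to itself and its graph neighbours."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])
    mask = adj + np.eye(adj.shape[0])             # keep self-attention
    scores = np.where(mask > 0, scores, -np.inf)  # cut non-neighbour links
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n, d = 5, 8                                       # 5 patch nodes, 8-dim features
x = rng.standard_normal((n, d))
adj = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:     # a thin chain of patches
    adj[i, j] = adj[j, i] = 1.0
w_q, w_k, w_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
out = graph_self_attention(x, adj, w_q, w_k, w_v)
print(out.shape)  # (5, 8)
```

Restricting attention with the adjacency mask is what makes the layer non-Euclidean: the thin, curved cartilage layer determines which voxels can influence each other.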
Hesitant fuzzy linguistic preference relations (HFLPRs) are useful because they provide an effective way to express opinions under uncertainty. To advance decision-making theory with HFLPRs, this paper introduces an algorithm for group decision making with HFLPRs based on acceptable consistency and consensus measurements. It involves (1) defining a hesitant fuzzy linguistic geometric consistency index (HFLGCI) and proposing procedures for checking the consistency of an HFLPR and for repairing an inconsistent one; and (2) measuring group consensus by the similarity between the original individual HFLPRs and the overall perfect HFLPR, and then establishing a consensus-reaching procedure that includes determining the decision makers' weights. The convergence and monotonicity of the two proposed procedures are proved. Experiments are further conducted to investigate the critical values of the defined HFLGCI, and a comparative analysis is performed to show the effectiveness of the proposed algorithm. A case study on the performance evaluation of venture-capital guiding funds is given to illustrate the applicability of the algorithm. As an application of our work, an online decision-making portal is finally provided for decision makers to solve their problems with the proposed algorithm.
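The HFLGCI builds on the classic geometric consistency index (GCI) for crisp reciprocal preference matrices; a perfectly consistent relation scores zero. A minimal sketch of that underlying index (the hesitant fuzzy linguistic extension is the paper's contribution and is not reproduced here):

```python
import numpy as np

def geometric_consistency_index(a):
    """GCI of a crisp multiplicative preference matrix: mean squared
    log-deviation of a_ij from the ratio of geometric-mean priorities."""
    n = a.shape[0]
    w = np.exp(np.log(a).mean(axis=1))        # row geometric means as priorities
    e = np.log(a) - np.log(np.outer(w, 1.0 / w))
    return 2.0 / ((n - 1) * (n - 2)) * (np.triu(e, 1) ** 2).sum()

weights = np.array([1.0, 2.0, 4.0])
consistent = np.outer(weights, 1.0 / weights)   # a_ij = w_i / w_j
print(round(geometric_consistency_index(consistent), 12))  # 0.0
```

The consistency-checking procedure then compares this index to a critical threshold, which is exactly the quantity the paper's experiments calibrate for the hesitant fuzzy linguistic case.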
Background: The 2019 novel coronavirus disease (COVID-19) has spread widely around the world, posing a huge threat to people's living environment. Objective: Under computed tomography (CT) imaging, the structural characteristics of COVID-19 lesions are complex and vary from case to case. To accurately locate COVID-19 lesions and assist doctors in making optimal diagnosis and treatment plans, a deeply supervised ensemble learning network is presented for COVID-19 lesion segmentation in CT images. Methods: Considering that large numbers of COVID-19 CT images and corresponding lesion annotations are difficult to obtain, a transfer learning strategy is adopted to compensate for this shortcoming and alleviate the overfitting problem. Since a conventional single deep learning framework has difficulty extracting COVID-19 lesion features effectively and may leave some lesions undetected, a deeply supervised ensemble learning network is proposed that combines local and global features for COVID-19 lesion segmentation. Results: The performance of the proposed method was verified in experiments on a public dataset. Compared with manual annotations, the proposed method achieved a high intersection over union (IoU) of 0.7279. Conclusion: A deeply supervised ensemble learning network is presented for coronavirus pneumonia lesion segmentation in CT images. The effectiveness of the proposed method was verified by visual inspection and quantitative evaluation. Experimental results show that the proposed method performs well in COVID-19 lesion segmentation.
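The reported IoU of 0.7279 is the standard intersection-over-union between the predicted and manually annotated lesion masks; a minimal implementation:

```python
import numpy as np

def iou(pred, gt):
    """Intersection over Union between two binary lesion masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0          # both masks empty: treat as perfect agreement
    return np.logical_and(pred, gt).sum() / union

a = np.zeros((4, 4), int); a[:2, :] = 1   # predicted mask, 8 pixels
b = np.zeros((4, 4), int); b[:3, :] = 1   # annotation, 12 pixels; overlap is 8
print(iou(a, b))  # 0.6666666666666666
```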
Surgical scene segmentation is fundamentally important for facilitating cognitive assistance in robotic surgery. However, pixel-wise annotation of videos in a frame-by-frame manner is expensive and time-consuming. To greatly ease the annotation burden, in this work we study semi-supervised scene segmentation from robotic surgical videos, which is practically essential yet rarely explored before. We consider a clinically suitable annotation scenario under equidistant sampling. We then propose PGV-CL, a novel pseudo-label guided cross-video contrastive learning method, to boost scene segmentation. It effectively leverages unlabeled data for trustworthy and global model regularization, producing more discriminative feature representations. Specifically, for trustworthy representation learning, we propose incorporating pseudo-labels to guide pair selection, obtaining more reliable representation pairs for pixel contrast. Moreover, we expand the representation learning space from the image level of previous work to cross-video, which can capture global semantics to benefit the learning process. We extensively evaluate our method on the public robotic surgery dataset EndoVis18 and the public cataract surgery dataset CaDIS. Experimental results demonstrate the effectiveness of our method, which consistently outperforms state-of-the-art semi-supervised methods under different labeling ratios and even surpasses fully supervised training on EndoVis18 with a 10.1% labeling ratio.
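The pseudo-label guided pair selection can be sketched as a supervised-contrastive loss over pixel embeddings in which pseudo-labels decide which pairs count as positives. The function below is a simplified single-view illustration, not the PGV-CL implementation:

```python
import numpy as np

def pseudo_label_contrastive_loss(emb, labels, tau=0.1):
    """InfoNCE-style pixel contrast where pseudo-labels pick the positives."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = emb @ emb.T / tau
    np.fill_diagonal(sim, -np.inf)                    # never pair a pixel with itself
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
    return -log_prob[pos].mean()

labels = np.array([0, 0, 1, 1])                       # pseudo-labels of 4 pixels
tight = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]])
loose = np.random.default_rng(0).standard_normal((4, 2))
print(pseudo_label_contrastive_loss(tight, labels))   # small: same-label pixels agree
print(pseudo_label_contrastive_loss(loose, labels))   # larger: embeddings are arbitrary
```

In the cross-video setting the embeddings in one batch would come from different videos, so pulling same-pseudo-label pixels together structures the embedding space globally rather than per image.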
Automatic surgical scene segmentation is fundamental to facilitating cognitive intelligence in the modern operating theatre. Previous works rely on conventional aggregation modules (e.g., dilated convolutions, convolutional LSTMs), which only make use of local context. In this paper, we propose a novel framework, STswinCL, that explores complementary intra- and inter-video relations to boost segmentation performance by progressively capturing global context. We first develop a hierarchical Transformer to capture intra-video relations, incorporating rich spatial and temporal cues from neighbouring pixels and previous frames. A joint space-time window-shifting scheme is proposed to efficiently aggregate these two cues into each pixel embedding. We then explore inter-video relations via pixel-to-pixel contrastive learning, which well structures the overall embedding space. A multi-source contrastive training objective is developed to group the pixel embeddings from across videos under ground-truth guidance, which is crucial for learning global properties of the whole dataset. We extensively validate our approach on two public surgical video benchmarks, the EndoVis18 challenge and the CaDIS dataset. Experimental results demonstrate the promising performance of our method, which consistently exceeds previous state-of-the-art approaches. Code is available at https://github.com/yuemingjin/stswincl.
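The joint space-time window scheme rests on partitioning a (T, H, W, C) feature map into windows that span both frames and pixels, optionally after a cyclic shift so that successive layers see different window boundaries. A toy partition routine (illustrative, not the STswinCL code):

```python
import numpy as np

def st_window_partition(x, wt, wh, ww, shift=(0, 0, 0)):
    """Split a (T, H, W, C) feature map into joint space-time windows,
    applying a cyclic shift first (as in shifted-window attention)."""
    x = np.roll(x, shift=[-s for s in shift], axis=(0, 1, 2))
    t, h, w, c = x.shape
    x = x.reshape(t // wt, wt, h // wh, wh, w // ww, ww, c)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, wt * wh * ww, c)   # (num_windows, tokens_per_window, C)

x = np.arange(2 * 4 * 4 * 3).reshape(2, 4, 4, 3).astype(float)
wins = st_window_partition(x, wt=2, wh=2, ww=2, shift=(1, 1, 1))
print(wins.shape)  # (4, 8, 3)
```

Attention is then computed within each window of `wt * wh * ww` tokens, so each pixel embedding aggregates both spatial neighbours and the previous frame at sub-quadratic cost.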
In traditional visual question generation (VQG), most images contain multiple concepts (e.g., objects and categories) from which a question could be generated, but models are trained to mimic an arbitrary choice of concept as given in the training data. This makes training difficult and also poses issues for evaluation: for most images there exist multiple valid questions, yet human references capture only one or a few. We present Guiding Visual Question Generation, a variant of VQG that conditions the question generator on categorical information about the question type and the objects it should explore. We propose two variants: (i) an explicitly guided model that enables an actor (human or automated) to select which objects and categories to generate questions for; and (ii) an implicitly guided model based on discrete latent variables that learns which objects and categories to condition on. Evaluated on the answer-category augmented VQA dataset, our proposed models show a substantial improvement over the state of the art (an increase of over 9 BLEU-4). Human evaluation validates that guidance helps generate questions that are grammatically coherent and relevant to the given images and objects.
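Explicit guidance amounts to conditioning the generator on an actor-chosen object and answer category. A minimal sketch of such a conditioning interface (the prefix-token format here is hypothetical, not the paper's actual input encoding):

```python
def build_guidance_prefix(objects, categories, chosen_obj, chosen_cat):
    """Encode the actor's choice of target object and answer category
    as a conditioning prefix for the question decoder."""
    if chosen_obj not in objects or chosen_cat not in categories:
        raise ValueError("guidance must come from the image's detected concepts")
    return f"<cat:{chosen_cat}> <obj:{chosen_obj}>"

# e.g. an image with a dog and a frisbee; the actor requests a colour question
print(build_guidance_prefix(["dog", "frisbee"], ["color", "count"], "dog", "color"))
# <cat:color> <obj:dog>
```

The implicitly guided variant would replace the actor's choice with a discrete latent variable inferred from the image.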
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications benefit only marginally, if at all, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, and sequential distillation, revealing that: 1) distilling token relations is more effective than CLS-token- and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student mismatches that of the teacher; 3) weak regularization is preferred. With these findings, we achieve significant fine-tuning accuracy improvements over from-scratch MIM pre-training on ImageNet-1K classification for the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, setting a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way to develop small vision Transformer models: exploring better training methods rather than introducing inductive biases into architectures, as most previous works do. Code is available at https://github.com/OliverRensu/TinyMIM.
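Token-relation distillation (finding 1 above) matches the student's token-token similarity map to the teacher's rather than matching raw features; since both relation maps are N x N, this also works when the student's feature dimension differs from the teacher's. A sketch using a KL objective between row-normalised similarity maps (the specific loss form is an assumption; the paper studies several options):

```python
import numpy as np

def _row_softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def token_relation_kl(student_tokens, teacher_tokens, tau=1.0):
    """KL divergence between teacher and student token-relation maps."""
    def relations(t):
        t = t / np.linalg.norm(t, axis=1, keepdims=True)
        return _row_softmax(t @ t.T / tau)    # (N, N), independent of feature dim
    p, q = relations(teacher_tokens), relations(student_tokens)
    return (p * np.log(p / q)).sum(axis=1).mean()

teacher = np.random.default_rng(0).standard_normal((6, 16))
student = np.random.default_rng(1).standard_normal((6, 8))  # narrower student is fine
print(round(token_relation_kl(teacher, teacher), 12))  # 0.0: identical relations
print(token_relation_kl(student, teacher))             # positive: relations differ
```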
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point-cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, while its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
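The implicit alignment idea is that 3D point coordinates are encoded into the same token space shared by both modalities, so a plain transformer can fuse them without projecting views. The fixed sinusoidal coordinate encoding below is purely an illustrative stand-in for the paper's encoding module:

```python
import numpy as np

def coord_encoding(points, dim):
    """Sinusoidal encoding of (N, 3) coordinates into (N, dim) tokens."""
    assert dim % 6 == 0                          # sin+cos per axis per frequency
    freqs = 2.0 ** np.arange(dim // 6)
    angles = points[:, :, None] * freqs          # (N, 3, dim // 6)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(points.shape[0], dim)

img_tokens = np.random.default_rng(0).standard_normal((10, 12))  # stand-in features
pts = np.random.default_rng(1).uniform(-50, 50, (7, 3))          # LiDAR points (m)
pt_tokens = coord_encoding(pts, 12)
fused = np.concatenate([img_tokens, pt_tokens], axis=0)  # one token sequence
print(fused.shape)  # (17, 12)
```

Because both modalities end up as tokens in one sequence, dropping the LiDAR tokens simply shortens the sequence, which is consistent with the robustness claim above.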
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
translated by 谷歌翻译
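NAIVEATTACK's trigger injection at the initial distillation phase can be sketched as stamping a fixed patch into the raw images before distillation begins; DOORPING would instead re-optimize the trigger throughout the distillation loop. The patch location and value below are arbitrary illustrative choices:

```python
import numpy as np

def stamp_trigger(images, trigger, value=1.0):
    """Stamp a fixed trigger patch into the bottom-right corner of a
    batch of (N, H, W) grayscale images, leaving the rest untouched."""
    out = images.copy()
    th, tw = trigger.shape
    out[:, -th:, -tw:] = np.where(trigger > 0, value, out[:, -th:, -tw:])
    return out

imgs = np.zeros((4, 8, 8))          # stand-in for raw or synthetic images
trigger = np.ones((2, 2))           # a 2x2 white-square trigger
poisoned = stamp_trigger(imgs, trigger)
print(poisoned[0, -2:, -2:])        # the stamped corner patch
```

The key point of the attack is *where* this happens: in the distillation procedure, so every model later trained on the distilled data inherits the backdoor.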
Blind image quality assessment (BIQA) remains challenging due to the diversity of distortions and the variation of image content, which complicate distortion patterns across different scales and aggravate the difficulty of the regression problem in BIQA. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies that make the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and better optimize the regression task in line with the easy-to-hard law of the human learning process. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets; the experimental results indicate that the performance of PMT-IQA is superior to the comparison approaches, and that both the MS and PMT modules improve the model's performance.
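The progressive easy-to-hard idea can be sketched as a schedule that shifts loss weight from an easier auxiliary task toward the harder quality-regression task as training proceeds. The linear schedule below is an assumption for illustration, not PMT-IQA's exact design:

```python
def progressive_weights(epoch, total_epochs):
    """Linearly shift emphasis from the easier auxiliary task to the
    harder quality-regression task as training progresses."""
    r = epoch / total_epochs
    return 1.0 - r, r                       # (auxiliary weight, regression weight)

def pmt_loss(aux_loss, reg_loss, epoch, total_epochs):
    a, b = progressive_weights(epoch, total_epochs)
    return a * aux_loss + b * reg_loss

print(progressive_weights(0, 10))   # (1.0, 0.0): start on the easy task
print(progressive_weights(10, 10))  # (0.0, 1.0): end on the hard task
```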